A Primer on Provenance: Better understanding data requires tracking its history and context

Authors

  • Lucian Carata
  • Sherif Akoush
  • Nikilesh Balakrishnan
  • Thomas Bytheway
  • Ripduman Sohan
  • Margo Seltzer
  • Andy Hopper

Abstract

Assessing the quality or validity of a piece of data is not usually done in isolation. You typically examine the context in which the data appears, try to determine its original sources, or review the process through which it was created. This is not so straightforward with digital data, however: the result of a computation might have been derived from numerous sources through a series of complex transformations, possibly applied over long periods of time. As the quantity of data contributing to a particular result increases, keeping track of how the different sources and transformations relate to each other becomes more difficult. This limits the ability to answer questions about a result’s history, such as: What underlying assumptions is the result based on? Under what conditions does it remain valid? What other results were derived from the same data sources? The metadata that must be systematically captured to answer these (or similar) questions is called provenance (or lineage). It refers to a graph describing the relationships among all the elements (sources, processing steps, contextual information and dependencies) that contributed to the existence of a piece of data. This article presents current research in this field from a practical perspective, discussing not only existing systems and the fundamental concepts needed to use them in applications today, but also future challenges and opportunities. A number of use cases illustrate how provenance might be useful in practice.
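The abstract describes provenance as a graph linking sources, processing steps, contextual information and dependencies. The Python sketch that follows illustrates that idea under our own assumptions; it is not a system described in the article. The class and method names (`ProvenanceGraph`, `record`, `sources`, `derived_from_same_sources`) and the example file and step names are hypothetical, and the two edge kinds only loosely mirror the `used` and `wasGeneratedBy` relations of W3C PROV without implementing that standard. It shows how two of the questions above (what a result's original sources are, and what other results were derived from the same sources) reduce to reachability queries over such a graph.

```python
from collections import defaultdict, deque


class ProvenanceGraph:
    """Minimal in-memory provenance graph (hypothetical, for illustration only).

    Nodes are data items or processing steps. An edge X -> Y means
    "X contributed directly to Y": either a step used X as input,
    or X is the step that generated the data item Y.
    """

    def __init__(self):
        self.parents = defaultdict(set)   # node -> nodes that directly contributed to it
        self.children = defaultdict(set)  # node -> nodes it directly contributed to
        self.kind = {}                    # node -> "data" or "process"
        self.context = {}                 # process -> contextual info (e.g. parameters)

    def _link(self, src, dst):
        self.parents[dst].add(src)
        self.children[src].add(dst)

    def record(self, process, inputs, outputs, **context):
        """Record one processing step that used `inputs` and generated `outputs`."""
        self.kind[process] = "process"
        self.context[process] = context
        for item in inputs:
            self.kind.setdefault(item, "data")
            self._link(item, process)     # the step used this input
        for item in outputs:
            self.kind[item] = "data"
            self._link(process, item)     # the output was generated by the step

    @staticmethod
    def _reachable(start, edges):
        """Breadth-first search collecting every node reachable from `start`."""
        seen, queue = set(), deque([start])
        while queue:
            for nxt in edges.get(queue.popleft(), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def ancestors(self, node):
        """Everything that contributed, directly or indirectly, to `node`."""
        return self._reachable(node, self.parents)

    def descendants(self, node):
        """Everything that `node` contributed to, directly or indirectly."""
        return self._reachable(node, self.children)

    def sources(self, node):
        """Original data sources of `node`: data ancestors with no inputs of their own."""
        return {n for n in self.ancestors(node)
                if self.kind.get(n) == "data" and not self.parents.get(n)}

    def derived_from_same_sources(self, result):
        """Other data items derived from at least one of `result`'s original sources."""
        related = set()
        for src in self.sources(result):
            related |= self.descendants(src)
        return {n for n in related if self.kind.get(n) == "data"} - {result}


# Hypothetical example pipeline: all file and step names are made up.
g = ProvenanceGraph()
g.record("clean", inputs=["sensor.csv"], outputs=["clean.csv"], threshold=0.5)
g.record("aggregate", inputs=["clean.csv", "calibration.json"], outputs=["report.pdf"])
g.record("plot", inputs=["clean.csv"], outputs=["figure.png"])

print(sorted(g.sources("report.pdf")))                    # ['calibration.json', 'sensor.csv']
print(sorted(g.derived_from_same_sources("report.pdf")))  # ['clean.csv', 'figure.png']
```

In this sketch each query is a plain breadth-first traversal over an in-memory graph. A practical provenance system would also need durable storage and some way of capturing these records automatically rather than by explicit `record` calls; both are left out here.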


Similar resources

Provenance and Annotations for Linked Data

Provenance tracking for Linked Data requires the identification of Linked Data resources. Annotating Linked Data on the level of single statements requires the identification of these statements. The concept of a Provenance Context is introduced as the basis for a consistent data model for Linked Data that incorporates current best-practices and creates identity for every published Linked Datas...

Tracking provenance of earth science data

Tremendous volumes of data have been captured, archived and analyzed. Sensors, algorithms and processing systems for transforming and analyzing the data are evolving over time. Web Portals and Services can create transient data sets on-demand. Data are transferred from organization to organization with additional transformations at every stage. Provenance in this context refers to the source of...

Augmenting geospatial data provenance through metadata tracking in geospatial service chaining

In a service-oriented environment, heterogeneous data from distributed data archiving centers and various geo-processing services are chained together dynamically to generate on-demand data products. Creating an executable service chain requires detailed specification of metadata for data sets and service instances. Using metadata tracking, semantics-enabled metadata are generated and propagate...

Learning Declarative Models from Ontology Alignment Provenance Data

Ontology matching is a process applied in several scenarios in order to reduce semantic heterogeneity by trying to find an alignment among different ontologies. There are in the literature several techniques proposed to improve the results of an ontology matching process, and this research field is so fruitful that there is even a well-established initiative – OAEI – that organizes annual compe...

Contextual Information: Lenses for Observing the Data Universe

Facilitating the reuse of existing data requires a better understanding of their context. Instead of focusing on dataset-specific metadata and provenance records alone, we propose to explore the broader, often implicit contextual information that is formed by viewing data as an interconnected system.

Publication date: 2014